AITopics | video object segmentation

video object segmentation

Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems, Apr-24-2026, 19:36:00 GMT

In: ECCV (2020). [34] Yang, Z., Wei, Y., Yang, Y.: Collaborative video object segmentation by multi-scale foreground-background integration.

artificial intelligence, machine learning, segmentation, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems, Apr-24-2026, 19:35:56 GMT

This paper investigates how to realize better and more efficient embedding learning for semi-supervised video object segmentation in challenging multi-object scenarios. State-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately in multi-object scenarios, consuming several times the computing resources. To solve this problem, we propose an Associating Objects with Transformers (AOT) approach that matches and decodes multiple objects uniformly. In detail, AOT employs an identification mechanism to associate multiple targets in the same high-dimensional embedding space. Thus, we can process the matching and segmentation decoding of multiple objects simultaneously, as efficiently as processing a single object.
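
As a rough illustration of the identification idea in the abstract above, the sketch below gives each object label its own identity vector and writes all objects of a frame into one shared embedding map, so a single pass covers every target. The names `make_identity_bank`, `id_embedding`, and `decode_labels`, the random identity vectors, and the treatment of background as label 0 are assumptions made for this example; it is not the AOT implementation.

```python
import random


def make_identity_bank(num_ids, dim, seed=0):
    """Create one random identity vector per label (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]


def id_embedding(label_map, bank):
    """Map an H x W grid of integer labels to an H x W grid of identity vectors.

    Assumption: label 0 is background, label k > 0 is object k.
    All objects end up in the same embedding map.
    """
    return [[list(bank[label]) for label in row] for row in label_map]


def decode_labels(embedding_map, bank):
    """Recover labels by picking the nearest identity vector for each pixel."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        [min(range(len(bank)), key=lambda k: sq_dist(vec, bank[k])) for vec in row]
        for row in embedding_map
    ]


# Two objects plus background, encoded and decoded in a single pass.
bank = make_identity_bank(num_ids=3, dim=8)
labels = [[0, 1], [2, 1]]
shared_map = id_embedding(labels, bank)
recovered = decode_labels(shared_map, bank)
```

The point of the sketch is only that the cost of `id_embedding` and `decode_labels` does not depend on running a separate pass per object; a learned model would replace both the random bank and the nearest-vector decoding.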

artificial intelligence, machine learning, segmentation, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models

Neural Information Processing Systems, Feb-9-2026, 00:08:22 GMT

In the task of semi-supervised video object segmentation, the input is the binary mask of an object in the first frame, and the desired output consists of the corresponding masks of that object in the subsequent frames.
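
To make the input and output of this task concrete, here is a minimal sketch of the interface: a first-frame binary mask goes in, and one mask per subsequent frame comes out. The function name `propagate_mask`, the grayscale frames stored as nested lists, and the nearest-intensity rule are all assumptions chosen for brevity. This is a toy baseline, not the method of the listed paper.

```python
def propagate_mask(first_frame, first_mask, later_frames):
    """Toy semi-supervised VOS baseline (illustrative only).

    first_frame:  H x W grid of grayscale intensities
    first_mask:   H x W grid of 0/1 values marking the object in first_frame
    later_frames: list of H x W grids to segment
    Returns one 0/1 mask per later frame.
    """
    # Collect reference intensities for object and background pixels.
    fg = [p for row, mrow in zip(first_frame, first_mask)
          for p, m in zip(row, mrow) if m == 1]
    bg = [p for row, mrow in zip(first_frame, first_mask)
          for p, m in zip(row, mrow) if m == 0]
    if not fg or not bg:
        raise ValueError("first_mask must contain both object and background")

    def nearest(p, refs):
        return min(abs(p - r) for r in refs)

    # Label each pixel by whichever reference set holds the closest intensity.
    return [
        [[1 if nearest(p, fg) < nearest(p, bg) else 0 for p in row]
         for row in frame]
        for frame in later_frames
    ]


first_frame = [[10, 10, 200], [10, 200, 200]]
first_mask = [[0, 0, 1], [0, 1, 1]]
later_frames = [[[10, 200, 200], [10, 10, 200]]]
masks = propagate_mask(first_frame, first_mask, later_frames)
```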

artificial intelligence, machine learning, segmentation, (18 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.05)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

Neural Information Processing Systems, Feb-7-2026, 19:43:50 GMT

We introduced an adaptive feature bank update scheme to dynamically absorb new features and discard obsolete features.

artificial intelligence, machine learning, segmentation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > Louisiana > East Baton Rouge Parish > Baton Rouge (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Neural Information Processing Systems, Dec-25-2025, 15:36:07 GMT

This paper focuses on developing a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and transforms the current-frame feature from object-agnostic to object-specific. However, the increase in object-specific information inevitably leads to the loss of object-agnostic visual information in deep propagation layers. To solve this problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.

decoupling feature, hierarchical propagation, video object segmentation, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation

Neural Information Processing Systems, Dec-25-2025, 07:12:48 GMT

This paper studies referring video object segmentation (RVOS) by boosting video-level visual-linguistic alignment. Recent approaches model the RVOS task as a sequence prediction problem and perform multi-modal interaction as well as segmentation for each frame separately. However, the lack of a global view of video content leads to difficulties in effectively utilizing inter-frame relationships and understanding textual descriptions of object temporal variations. To address this issue, we propose Semantic-assisted Object Cluster (SOC), which aggregates video content and textual guidance for unified temporal modeling and cross-modal alignment. By associating a group of frame-level object embeddings with language tokens, SOC facilitates joint space learning across modalities and time steps. Moreover, we present multi-modal contrastive supervision to help construct a well-aligned joint space at the video level. We conduct extensive experiments on popular RVOS benchmarks, and our method outperforms state-of-the-art competitors on all benchmarks by a remarkable margin. In addition, the emphasis on temporal coherence enhances the segmentation stability and adaptability of our method when processing text expressions with temporal variations.

name change, semantic-assisted object cluster, video object segmentation, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.66)

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

Neural Information Processing Systems, Dec-23-2025, 20:56:40 GMT

This paper presents a new matching-based framework for semi-supervised video object segmentation (VOS). Recently, state-of-the-art VOS performance has been achieved by matching-based algorithms, in which feature banks are created to store features for region matching and classification. However, how to effectively organize information in the continuously growing feature bank remains under-explored, and this leads to an inefficient design of the bank. We introduce an adaptive feature bank update scheme to dynamically absorb new features and discard obsolete features. We also design a new confidence loss and a fine-grained segmentation module to enhance the segmentation accuracy in uncertain regions. On public benchmarks, our algorithm outperforms existing state-of-the-art methods.
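
One plausible reading of "absorb new features and discard obsolete features" can be sketched as a bounded bank that merges a new feature into a similar stored one and otherwise evicts the least-used entry when full. The class name `FeatureBank`, the cosine-similarity threshold, the running-average merge, and the least-used eviction rule are assumptions for illustration; the abstract does not specify these details.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class FeatureBank:
    """Bounded feature store with merge-on-similar and evict-least-used."""

    def __init__(self, capacity, merge_threshold=0.95):
        self.capacity = capacity
        self.merge_threshold = merge_threshold  # assumed value
        self.features = []
        self.use_counts = []

    def absorb(self, feature):
        # Find the stored feature most similar to the incoming one.
        best_i, best_sim = -1, -1.0
        for i, stored in enumerate(self.features):
            sim = cosine(stored, feature)
            if sim > best_sim:
                best_i, best_sim = i, sim

        if best_i >= 0 and best_sim >= self.merge_threshold:
            # Absorb: fold the new feature into a running average.
            n = self.use_counts[best_i]
            self.features[best_i] = [
                (s * n + f) / (n + 1)
                for s, f in zip(self.features[best_i], feature)
            ]
            self.use_counts[best_i] = n + 1
        else:
            # Discard: make room by dropping the least-used entry.
            if len(self.features) >= self.capacity:
                victim = min(range(len(self.use_counts)),
                             key=self.use_counts.__getitem__)
                del self.features[victim]
                del self.use_counts[victim]
            self.features.append(list(feature))
            self.use_counts.append(1)


feature_bank = FeatureBank(capacity=2)
for vec in ([1.0, 0.0], [0.0, 1.0], [1.0, 0.01], [-1.0, 0.0]):
    feature_bank.absorb(vec)
```

In this run the third vector is merged into the first entry, and the fourth displaces the entry that was used least, so the bank never grows past its capacity.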

adaptive feature bank, feature bank and uncertain-region refinement, video object segmentation, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.70)

Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems, Dec-23-2025, 19:22:38 GMT

associating object, transformer, video object segmentation, (5 more...)

Neural Information Processing Systems

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Vision (0.65)

Mitigating Query Selection Bias in Referring Video Object Segmentation

Zhang, Dingwei, Zhang, Dong, Tang, Jinhui

arXiv.org Artificial Intelligence, Sep-18-2025

Recently, query-based methods have achieved remarkable performance in Referring Video Object Segmentation (RVOS) by using textual static object queries to drive cross-modal alignment. However, these static queries are easily misled by distractors with similar appearance or motion, resulting in query selection bias. To address this issue, we propose Triple Query Former (TQF), which factorizes the referring query into three specialized components: an appearance query for static attributes, an intra-frame interaction query for spatial relations, and an inter-frame motion query for temporal association. Instead of relying solely on textual embeddings, our queries are dynamically constructed by integrating both linguistic cues and visual guidance. Furthermore, we introduce two motion-aware aggregation modules that enhance object token representations: Intra-frame Interaction Aggregation incorporates position-aware interactions among objects within a single frame, while Inter-frame Motion Aggregation leverages trajectory-guided alignment across frames to ensure temporal coherence. Extensive experiments on multiple RVOS benchmarks demonstrate the advantages of TQF and the effectiveness of our structured query design and motion-aware aggregation modules.

machine learning, object-oriented architecture, segmentation, (15 more...)

arXiv.org Artificial Intelligence

arXiv:2509.13722

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)
